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Annotation of regulatory elements and identification of the transcription-related factors (TRFs) targeting these elements 
are key steps in understanding how cells interpret their genetic blueprint and their environment during development, and 
how that process goes awry in the case of disease. One goal of the modENCODE [model organism ENCycIopedia of DNA 
Elements) Project is to survey a diverse sampling of TRFs, both DNA-binding and non-DNA-binding factors, to provide 
a framework for the subsequent study of the mechanisms by which transcriptional regulators target the genome. Here we 
provide an updated map of the Drosophila melanogaster regulatory genome based on the location of 84 TRFs at various stages 
of development. This regulatory map reveals a variety of genomic targeting patterns, including factors with strong 
preferences toward proximal promoter binding, factors that target intergenic and intronic DNA, and factors with distinct 
chromatin state preferences. The data also highlight the stringency of the Polycomb regulatory network, and show as- 
sociation of the Trithorax-Iike (Trl) protein with hotspots of DNA binding throughout development. Furthermore, the 
data identify more than 5800 instances in which TRFs target DNA regions with demonstrated enhancer activity. Regions of 
high TRF co-occupancy are more likely to be associated with open enhancers used across cell types, while lower TRF 
occupancy regions are associated with complex enhancers that are also regulated at the epigenetic level. Together these 
data serve as a resource for the research community in the continued effort to dissect transcriptional regulatory mech- 
anisms directing Drosophila development. 



[Supplemental material is available for this article.] 

Whole-genome sequencing has become increasingly straightfor- 
ward in recent years, though our ability to interpret these genomes 
is still far from complete. Understanding the regulatory genome — the 
noncoding portion of the genome dictating where, when, and to 
what level genes are expressed — remains a significant challenge. 
Annotation and characterization of regulatory elements is espe- 
cially important for metazoan organisms, where complex three- 
dimensional body plans consisting of many cell types are ultimately 
derived from the genomic blueprint of a single zygote. Drosophila 
melanogaster has been at the forefront of the biology of transcrip- 
tional regulation for decades, with polytene chromosome studies 
providing some of our earliest glimpses into genome-wide gene 
regulatory and protein-DNA interactions (Ritossa 1964; Ashburner 
1967, 1970; Silver and Elgin 1976; Jamrich et al. 1977; Andrew and 
Scott 1994). Drosophila continues and will remain to be a valuable 
model for developmental gene regulation due to its ease of genetic 
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manipulation and plethora of comparative genomics resources 
(Slattery et al. 2012). 

Several pioneering studies have provided a genome-wide view 
into aspects of the Drosophila regulatory genome; from RNA 
polymerase II to Polycomb-Response Elements, insulator ele- 
ments, chromatin states, transcription factors, and even the con- 
formational architecture of nucleus (Negre et al. 2006, 2010, 2011; 
Zeitlinger et al. 2007; Kwong et al. 2008; Li et al. 2008; Filion et al. 
2010; Kharchenko et al. 2011; Sexton et al. 2012; Slattery et al. 
2012; White 2012). Several recent reviews provide an excellent 
synthesis of many of these studies (Biggin 2011; Delest et al. 2012; 
Lelli et al. 2012; Ong and Corces 2012; Slattery et al. 2012; Spitz 
and Furlong 2012; White 2012). Briefly, the picture emerging from 
these studies is one of a genome organized into distinct chromatin 
types, roughly separable into 'active' and 'repressive' states, phys- 
ically separated from one another within the three-dimensional 
nucleus; in some cases regulatory functions are located in spatially 
discrete sites (e.g., transcription factories and Polycomb bodies) 
(van Steensel 2011; Delest et al. 2012). Within this chromatin 
environment, binding of transcriptional regulators to DNA is 
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seemingly widespread, at least relative to the expected number of 
direct regulatory target genes for most TRFs, and often highly 
overlapping (i.e., highly occupied target [HOT] regions) (Moorman 
et al. 2006; Li et al. 2008; Negre et al. 2011). Higher-order chro- 
matin structure and spatially discrete regions of high TRF concen- 
tration likely impact TRF-binding patterns genome wide, although 
the local chromatin landscape is also influential because DNA ac- 
cessibility is a major determinant of TRF binding (MacArthur et al. 
2009; Guertin and Lis 2010; Li et al. 2011b). Though far from 
complete, annotation of the Drosophila regulatory genome has 
begun thanks to these studies, and focused genetic and genomic 
approaches are being used to address mechanistic questions 
stemming from them (Guertin and Lis 2010; Kvon et al. 2012). 

The importance of focused studies of gene regulatory net- 
works — studies exploring panels of TRFs working within the same 
network, or studies exploring the impact of cellular context on 
TRF-DNA interactions — cannot be overstated. The goal of the 
modENCODE (mod el organism ENCyclopedia of DNA Elements) 
Project, however, is to survey a diverse sampling of transcriptional 
regulators, both DNA-binding and non-DNA-binding factors, to 
get a broad view of the patterns by which transcriptional regulators 
target the genome. Here we provide an updated map of the Dro- 
sophila melanogaster regulatory genome based on the location of 
localization of 84 transcriptional regulators at various stages of 
development. 

Results 

We describe in vivo genome-wide binding patterns of 84 
Drosophila transcriptional regulatory factors (TRFs), using both 
previously published data sets and new data generated by the 
modENCODE Project (Li et al. 2008; Zinzen et al. 2009; Negre 
et al. 2011; Supplemental Table SI). Overall, 65 DNA-binding pro- 
teins and 19 non-DNA-binding proteins (cofactors, chromatin- 
binding factors) are represented; most factors were tested in one 
developmental stage or cell line, though multiple factors were tested 
in multiple contexts (Supplemental Table SI). In total, these data, 
all of which are available through the modENCODE Data Co- 
ordination Center (http://intermine.modencode.org/) or Gene Ex- 
pression Omnibus (GEO; see Supplemental Table SI for accession 
numbers), represent 171 separate genome-wide ChIP experiments 
performed in duplicate or greater, 413,743 TRF-binding sites, and 
50,336 unique TRF-binding regions. 

Genomic features targeted by transcriptional regulators 

Gene expression is controlled by regulatory DNA sequences and 
the TRFs that interact with these sequences. A significant amount 
of regulatory information is found immediately upstream of 
transcription start sites in gene promoter regions. However, the 
complex gene expression patterns of metazoans often require ad- 
ditional, combinatorial input from distal regulatory sequences 
known as enhancers, or ris-regulatory modules (CRMs) (Lelli et al. 
2012; Spitz and Furlong 2012). To get an overall view of the ge- 
nomic features bound by the transcriptional regulators studied, 
we characterized binding events as falling into one of the following 
categories: promoter proximal, intergenic, intron, exon, or down- 
stream (Supplemental Fig. SI). The binding of most general regu- 
lators (HDACs and chromatin remodelers) is highly biased toward 
promoter-proximal binding, and the majority of site-specific DNA- 
binding TRFs also display significant promoter-proximal binding, 
although there is a substantial subset that prefers distal binding 



sites. For instance, the developmental regulators Eve, Hth, Pan 
(also known as dTCF), EcR, and USP (Harding et al. 1986; Yao et al. 
1992; Mann et al. 2009; Archbold et al. 2012) all bind intergenic or 
intronic DNA >50% of the time across multiple developmental 
stages, suggesting that these factors often act at distal enhancers 
(Supplemental Fig. SI). Thus, in these cases, global-binding pref- 
erences are consistent with TRF molecular function. 

Chromatin types targeted by transcriptional regulators 

Chromatin landscape has the potential to significantly influence 
the binding of transcriptional regulators to DNA, both through 
local and global influences on DNA accessibility. Two recent ge- 
nome-wide studies have annotated the Drosophila genome based 
on chromatin state. Despite using different cell types, experi- 
mental techniques, and chromatin factors these independent 
studies generated functionally consistent chromatin state maps 
(Filion et al. 2010; Roy et al. 2010; Kharchenko et al. 2011); the 
modENCODE model consists of nine chromatin states, while the 
Filion et al. (2010) model describes five chromatin states. For 
simplicity, we focus here on the five-state model, in which chro- 
matin states are assigned colors. YELLOW and RED chromatin are 
the two 'active' states, with the former generally associated with 
ubiquitous genes and the latter with patterned genes. BLACK, 
GREEN, and BLUE represent the three 'inactive' states; BLACK re- 
gions are relatively gene-poor, GREEN is associated with hetero- 
chromatin, and BLUE is associated with Pc-mediated silencing. 
Importantly, although the five-state model is based on data from 
Kcl67 cells, our developmental timecourse of chromatin modifi- 
cations reveals that many features captured in this cell line are 
consistent throughout development; for example, BLUE chroma- 
tin is always marked with repressive histone modifications and 
YELLOW and RED chromatin is always associated with active 
histone modifications (Filion et al. 2010; Negre et al. 2011; Sup- 
plemental Fig. S2). The consistency of these global trends indicates 
that the five-state model is relevant to many developmental stages. 
We sought to explore the relationship between these chromatin 
states and transcriptional regulator binding patterns (Filion et al. 
2010). Thus, for each TRF, we looked at the fraction of binding 
events that fall into each of the five chromatin types. 

Hierarchical clustering of TRF binding across chromatin states 
reveals three primary chromatin-type preferences. The largest 
cluster consists of TRFs that primarily bind DNA in the YELLOW 
'active' chromatin state. This cluster consists of many of the gen- 
eral factors described above as promoter-associated (HDACs, etc.) 
(Fig. 1A). However, a number of DNA-binding factors such as Ttk, 
Hr78, and Eip74EF also preferentially bind YELLOW chromatin. 
TRFs falling into the RED cluster are almost entirely DNA-binding 
factors including Trl (also known as GAGA factor, or GAF), a mul- 
tifunctional regulator of gene expression, and a number of fac- 
tors that drive tissue-specific patterns of expression, such as Pan, 
EcR, USP, as well as many of the mesodermal TRFs (Mef2, Twi, 
Bin, etc.) (Fig. 1A; Yao et al. 1992; Ciglar and Furlong 2009; 
Archbold et al. 2012). 

There are three 'inactive' chromatin states according to the 
five-state model: BLACK, BLUE, and GREEN. GREEN chromatin is 
repressive heterochromatin, primarily pericentric, and is targeted 
by HP1 and little else (Fig. 1A). However, we observed that a large 
proportion of TRFs target BLACK and BLUE chromatin. BLACK 
chromatin covers approximately half of the genome and is re- 
latively gene-poor; the genes that are associated with it are gen- 
erally expressed in a tissue-specific manner. BLUE chromatin is 
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Figure 1. Transcriptional regulator overlap with chromatin states. (A) Heatmap representing the fraction of TRF binding overlapping chromatin states as 
defined by Filion et al. (201 0). (B) TRFs with >35% binding in BLUE/Pc associated chromatin. TRFs shaded dark blue are Pc targets based on data from Kwong 
et al. (2008) and Filion et al. (2010), and TRFs shaded light blue are Pc targets based on data from Kwong et al. (2008). See also Supplemental Table S2. 
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Figure 2. TRF-TRF colocalization matrix. (A) Significance of binding site overlap for pairwise TRF-TRF comparisons; shading represents Z-score en- 
richment (red) or depletion (blue). (B) Pan binding across chromosome 3R (~1 4 Mb), and the Notum, naked cuticle (nkd), and vestigial (vg) loci at early 
embryonic, late embryonic, and larval developmental stages. See also Supplemental Table S3. 
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Pc-targeted repressive chromatin. The Class II insulator protein 
Su(Hw) (Negre et al. 2010) has a strong preference for BLACK 
chromatin across developmental contexts (early embryo, late 
embryo, and white prepupal stages). Many important regulators of 
early developmental patterning (Bed, Ftz, H, Slpl, etc.) preferen- 
tially bind BLACK and BLUE chromatin to approximately the same 
degree (Fig. 1A; Levine and Davidson 2005; Sen et al. 2010). Finally 
a set of developmental TRFs including Kni, Til, Hth, Gro, Tin, Kr, 
and Cad (Sauer et al. 1996; Levine and Davidson 2005; Ciglar and 
Furlong 2009; Mann et al. 2009) preferentially binds BLUE chro- 
matin over all other types (Fig. 1A). 

Interestingly, many of the TRFs that preferentially bind Pc- 
targeted DNA regions — BLUE chromatin — are involved in pro- 
viding cell, tissue, or regional identities; these factors are often 
referred to as selector or selector-like genes in Drosophila (Garcia- 
Bellido 1975; Mann and Morata 2000). Because of their significant 
impact on cellular identity, the expression of selector-like genes 
must be precisely controlled. Therefore, we used the genome-wide 
ChIP data to ask whether TRFs that preferentially bind Pc-targeted 
DNA regions are Pc targets themselves. Indeed, of the 17 TRFs with 
>35% binding in BLUE chromatin, 14 (>80%) contain at least 100 
bp of BLUE chromatin within their gene units (i.e., Pc targeted) 
(Fig. IB). Further, two of the three exceptions have been identified 
as Pc targeted in another genome-wide study, and Pc indirectly 
regulates the lone factor that is not a direct Pc target (Stat92E) (see 
Discussion) (Kwong et al. 2008). These data indicate that many of 
the Pc-targeted, selector-like genes may fall within a partially self- 
contained Pc regulatory network. 

Coincident binding of transcriptional regulators 

Precise spatiotemporal patterns of gene expression require the in- 
tegration of multiple regulatory inputs from TRFs at enhancers. 
Interactions between transcriptional regulators at enhancers can 
be direct or indirect, and combined inputs can be additive, co- 
operative, or antagonistic (Slattery et al. 2011; Lelli et al. 2012; 
Spitz and Furlong 2012). To uncover potential coordinated regu- 
latory interactions we calculated the genome-wide binding corre- 
lations for all transcriptional regulator pairs to identify TRFs with 
similar binding profiles. Many putative interactions are evident, 
and expected relationships are clear from the overlap matrix (Fig. 
2A). For instance, EcR and USP are known to physically interact on 
regulatory enhancers, and these factors have similar binding pro- 
files at multiple stages of development (Fig. 2A; Yao et al. 1992). 
Similarly, binding of the transcriptional corepressor Gro overlaps 
significantly with Pan, consistent with previous studies, and we 
also find clusters of cobinding for early embryo TRFs and meso- 
derm TRFs (Fig. 2A; Cavallo et al. 1998; Li et al. 2008; Spitz and 
Furlong 2012). The corepressor Gro also overlaps significantly with 
Hth and Dll, indicating that these factors may use Gro when reg- 
ulating transcriptional repression, a possibility that has been sug- 
gested previously for Hth (Gebelein et al. 2004). In addition to 
these examples, a number of TRFs were tested at multiple stages of 
development (e.g., Trl, Pan, Cnc) and binding profiles from dif- 
ferent stages tend to show significant overlap (Fig. 2A). That is, the 
global trend for these factors is similar across development. How- 
ever, in these cases, there is still context-dependent binding varia- 
tion at potentially significant loci. For example, Pan binding is quite 
consistent across multiple developmental stages, but varies signifi- 
cantly at loci such as Notum, naked cuticle (nkd), and vestigial (vg), all 
previously characterized Pan targets (Fig. 2B; Klein and Arias 1999; 
Schweizer et al. 2003; Fang et al. 2006; Chang et al. 2008). 



Although many distinct, and expected, interactions are evi- 
dent, the highly overlapping nature of TRF binding is also clear in 
the cobinding matrix. In fact, 5692 regions are bound by at least 14 
factors (see Methods); in keeping with the previous literature, we 
refer to these hotspots of TRF localization as HOT (high-occupancy 
target) regions. Subtracting HOT regions from the calculation of 
TRF-TRF overlap significance leads to a more defined interaction 
matrix, often further highlighting the pairwise interactions de- 
scribed in the previous paragraph (Fig. 2A). 

We further explored the extensive TRF colocalization by 
looking for the TRFs most enriched for binding in HOT regions. 
Seven DNA-binding factors and seven non-DNA-binding factors 
were very highly enriched in HOT regions (Z-score >50, see 
Methods) (Fig. 3 A; Supplemental Table S4). With one excep- 
tion the experiments for all of these enriched factors overlap 
early- to mid-embryogenesis; the outlier is Tfl, which is enriched 
for HOT region targeting during mid-embyogenesis (8-16 h), late- 
embryogenesis (16-24 h), and in Kcl67 cells. An important role for 
Trl in the formation of HOT regions was suggested previously based 
on the overrepresentation of GAGA motifs — the Trl DNA-binding 
motif— across —2000 regions of high transcription-factor occu- 
pancy (Kvon et al. 2012). Indeed, we find that the GAGA motif is 
progressively enriched with increasing TRF occupancy (Fig. 3B,C; 
Supplemental Table S4). Interestingly, we find that Trl binding 
is remarkably consistent across development, with little varia- 
tion between ChlPs performed with embryo chromatin (8-16 h) 
and ChlPs performed with chromatin from dissected larval wing 
discs (third instar) (Fig. 3D). Indeed, binding in HOT regions is 
driving this context-independent Trl binding (Fig. 3E). Thus, al- 
though the majority of our data are from embryonic stages, the 
lack of developmental variation in Trl binding to HOT regions 
suggests that HOT regions might also remain consistent throughout 
development. 

Low-, medium-, and high-occupancy target regions 

Aside from high TRF co-occupancy, do HOT regions have any other 
distinguishing features? To address this question we separated 
binding regions into three categories: HOT, WARM, and COLD; 
HOT regions are targeted by 14 or more TRFs, WARM regions are 
targeted by 4-13 TRFs, and COLD regions are targeted by 1-3 TRFs 
(see Methods). Relative to COLD regions, HOT regions occupy 
a much smaller fraction of the intergenic genome (Fig. 4A). The 
reverse trend is seen in regions around transcription start sites. In 
both cases, the patterns of WARM-binding regions fall in the in- 
termediate range between HOT and COLD. The divergent patterns 
of HOT and COLD binding are also evident when looking at 
overlap with chromatin state: HOT regions are especially enriched 
for binding in RED and YELLOW (i.e., active) chromatin states, and 
depleted in all three inactive/repressed chromatin states, particu- 
larly BLACK and GREEN chromatin (Fig. 4B). Additionally, and 
consistent with previous reports, we found TRFs' DNA motifs tend 
to be enriched binding sites that fall within COLD and/ or WARM 
regions, and depleted in binding sites that fall within HOT regions 
(Supplemental Fig. S3). 

We next asked whether there are differences in DNA conser- 
vation across the range of TRF occupancies. Our measure of DNA 
conservation in this case is its phastCons score, which measures 
the probability that an individual base is part of a stretch of base 
pairs (usually —100-1000 bp in length) that is conserved. One 
implication from such analysis is that DNA with a higher phast- 
Cons score is under purifying selection and more likely to be 
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Figure 3. GAGA factor binding at HOT regions. (A) TRFs with data sets highly enriched for binding at HOT regions (Z-score >50). Only Trithorax-like/ 
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functional. Mean phastCons values were calculated for the 
center ±1 kb for all HOT, WARM, or COLD regions (Fig. 4C). In- 
triguingly, the patterns of conservation at HOT regions deviate 



from those at the lower occupancy regions. Generally speaking, 
the central 500 bp of COLD and WARM regions is more conserved 
than the central 500 bp of HOT regions (Fig. 4C). In addition, 
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gions described in Supplemental Figure SI . (B) Fraction of HOT, WARM, 
and COLD binding across the chromatin states described in Figure 1 . (C) 
Mean phastCons score (1 0-bp window) across HOT, WARM, and COLD 
regions (center of region ±1 kb). (D) Mean phastCons score ± confidence 
interval (95%) for the central 1 00 bp of HOT, WARM, and COLD regions. 
Asterisks represent significant divergence from random ([**]/>< 1 x 10~ 15 ; 
Wilcoxon rank sum test). 



whereas the lower occupancy regions display a distinct pattern of 
increased conservation near their centers, the HOT regions actually 
have the opposite pattern, with less conservation near their centers 
relative to the distal edges. This pattern at HOT regions is due to 
their tendency to fall at promoter regions, with the increased distal 
conservation reflective of nearby coding regions. Focusing on the 
central 100 bp of HOT, WARM, and COLD regions reveals that 
HOT regions are significantly less conserved than WARM/COLD 
regions (Fig. 4D). Interestingly, promoter-proximal HOT regions 
drive this pattern, as promoter-distal HOT regions have a conser- 
vation pattern similar to WARM and COLD regions (Supplemental 
Fig. S4A). In fact, promoters associated with HOT regions are 
significantly less conserved than promoters that do not overlap 
HOT regions (Supplemental Fig. S4B). Thus, although many HOT 
regions are less evolutionarily constrained, it is clear that the pro- 
moter-distal HOT regions and non-HOT binding regions are cen- 
tered on domains of increased DNA conservation, suggesting that 
these regions are likely to be functional. 

With regard to target genes, HOT regions are often associated 
with genes that are highly and ubiquitously expressed house- 
keeping genes, consistent with the gene classes that fall within the 
YELLOW chromatin state. This is clear from gene ontology (GO) 
analysis of the genes associated with HOT regions: Categories such 
as 'metabolic process' and 'cellular component biogenesis' are 
highly significant (both P < 10~ 15 ). Additionally, non-housekeep- 
ing categories including 'developmental process' and 'transcrip- 
tion regulator activity' are also enriched (both P < 10~ 25 ) among 
HOT target genes. However, comparing the gene sets targeted by 
HOT, WARM, and COLD regions becomes complicated because 
many loci are associated with more than one type of binding 
event. Thus, we instead focused on genes targeted by various 
combinations of HOT and COLD binding to explore potential 
differences in regulatory logic across unique gene sets. We broke 
genes down into loci associated only with a HOT binding region, 
loci associated with both HOT and COLD binding, and loci asso- 
ciated only with COLD binding. GO analysis of these gene cate- 
gories revealed an interesting pattern for genes associated with 
HOT regions (Fig. 5 A). Genes with HOT input but no COLD input 
are enriched for housekeeping categories (e.g., cellular protein 
metabolic process, cellular protein catabolic process). Genes re- 
ceiving both HOT and COLD regulatory input, on the other hand, 
are enriched for categories associated with transcription and de- 
velopmental patterning, morphogenesis, and differentiation (e.g., 
cell-fate specification, organ morphogenesis, neuron development, 
regulation of transcription). Together these results suggest that 
both ubiquitous housekeeping genes and highly regulated devel- 
opmental genes are associated with regions of high TRF co-occu- 
pancy; developmental genes, however, require more specific TRF 
inputs as well, likely to provide the combinatorial regulatory in- 
puts that are necessary to drive precise patterns of gene expression. 

Multiple results described thus far are consistent with a model 
in which lower occupancy binding regions (COLD and WARM) 
represent the traditional developmentally regulated enhancer — these 
regions are conserved, often distal to the promoter, and tend 
to be associated with genes involved in developmental pattern- 
ing. However, HOT regions can be associated with these pat- 
terning genes but are also associated with ubiquitous genes and 
tend to be promoter-proximal. It is important to point out that 
HOT regions are capable of driving patterned expression, as was 
recently demonstrated by Kvon et al. (2012) in a study of 108 HOT 
regions. However, the DNA regions tested in the aforementioned 
study tended to be near developmentally regulated genes and 
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located in BLUE or RED, rather than YELLOW, chromatin (Sup- 
plemental Fig. S5). Thus, although this study unequivocally 
demonstrated that HOT regions are able to drive patterned gene 
expression, it may not be representative of HOT regions as a whole. 
For this reason, we chose to explore the relationship between TRF 
occupancy and regulatory activity on a global scale. We asked 
whether enhancers genome wide are more likely to be associated 
with high-, medium-, or low-occupancy target regions. To address 
this question, we compared the genome-wide binding data with 
a recently published genome-wide assessment of Drosophila en- 
hancer activity, in which STARR-seq (self -transcribing active reg- 
ulatory region sequencing) was used to identify two primary types 
of enhancers (Arnold et al. 2013). The first enhancer class, 'open' 
enhancers, overlaps accessible (DNase I hypersensitive) DNA re- 
gions; the second class of enhancers, 'closed' enhancers, does not 
overlap accessible DNA and appears to be epigenetically regulated 
at the chromatin level (Arnold et al. 2013). Overall, 2211 HOT 
regions overlap an enhancer, 2089 WARM regions overlap an en- 
hancer, and 1617 COLD regions overlap an enhancer. A compari- 
son of HOT, WARM, and COLD regions to the two classes of en- 
hancers across S2 and OSC cells yielded interesting patterns of 
enrichment: in both cell types, HOT regions are significantly 
enriched for binding to open enhancers, whereas COLD regions 
are significantly enriched for binding to closed enhancers (Fig. 
5B,C). Warm regions are enriched for binding both classes. Further, 
HOT regions are significantly more likely to occupy enhancers that 
are active across both cell types (Fig. 5D). Thus, all classes of 
binding are likely to overlap DNA regions with enhancer activity 
but, consistent with the chromatin state overlap, HOT regions are 
more likely to be associated with open enhancers used across cell 
types, whereas lower occupancy regions tend to be associated with 
complex enhancers that are also regulated at the epigenetic level. 

Discussion 

In this study we provide analysis of new and previously published 
ChIP data, in combination with published chromatin state and 
genome-wide enhancer characterization data, to provide a window 
into the principles governing genomic gene regulatory networks 
and TRF-DNA interactions. 

From a gene regulatory network perspective, this work high- 
lights numerous interactions that may warrant further explora- 
tion. Thousands of TRF-DNA interactions are observed within 
these data, with 5823 binding events overlapping DNA regions 
that act as transcriptional enhancers (Arnold et al. 2013; Supple- 
mental Table S5). A comparison of the TRF-DNA interactions 
across this diverse set of transcriptional regulators has identified 
numerous cobinding events that highlight direct or indirect in- 
teractions between TRFs on the same DNA regions. Multiple fac- 
tors known to physically interact (EcR-USP, Pan-Gro, Yki-Trl) are 
identified as significantly colocalized on DNA across the genome 
(Yao et al. 1992; Cavallo et al. 1998; Oh et al. 2013), suggesting that 
additional colocalization relationships (e.g., Homothorax and 
Groucho) may represent functional interactions worth exploring 
in greater detail. 

A broad range of binding strategies and preferences are clear 
from the TRFs analyzed in these data. Although a majority of the 
factors tend to bind proximal promoter regions, many important 
developmental regulators are significantly bound to intronic and 
promoter-distal intergenic regions, likely representing targeting of 
enhancers controlling genes subject to complex developmentally 
regulated transcriptional controls. TRFs also differ in their associ- 



ation with various types of chromatin. Most TRFs preferentially 
bind one of the two 'active' chromatin states (RED, YELLOW), 
consistent with the overall accessibility of these chromatin types 
and their association with expressed genes. 

Not all TRFs bind primarily in the active chromatin states, 
however. For example, a number of TRFs (Fig. IB) are often bound 
to genomic regions that fall into the BLUE chromatin state. BLUE 
chromatin is a Pc-targeted chromatin and is associated with the 
H3K27me3 repressive histone modification (Simon and Kingston 
2009; Filion et al. 2010; van Steensel 2011). Interestingly, the fac- 
tors that favor Pc-targeted chromatin tend to be Pc targets them- 
selves. Indeed, there are 20 data sets, representing 17 TRFs, in 
which >35% of binding events overlap BLUE/Pc chromatin; 14 of 
these TRFs are also targeted by Pc as evidenced by the fact that their 
genie regions overlap BLUE chromatin (and based on published 
data). The three exceptions are Pel, Til, and Stat92E. Pel (Polycomb- 
like) is a Polycomb group protein that has been identified as Pc 
targeted in another genome-wide study that covered multiple 
stages of development (Kwong et al. 2008; Filion et al. 2010). Til 
has been previously characterized as a Pc target, though Pc binding 
is developmentally transient and was likely missed by the Kc cell- 
based chromatin state classifiers for this reason (Kwong et al. 
2008). And Stat92E is the terminal transcription factor of the JAK/ 
STAT signaling pathway (Yan et al. 1996). Although Stat92E does 
not appear to be regulated by Pc, the three genes encoding the 
Unpaired (Upd) family of JAK/STAT pathway ligands — os (out- 
stretched, also known as unpaired), upd2, and upd3 — fall within 
a large domain of BLUE chromatin and are confirmed Pc targets 
(Classen et al. 2009). Ligand-mediated activation of JAK/STAT by 
Upd is the rate-limiting step dictating Stat92E activity, so this is 
another case in which a transcription factor's activity is regulated 
by Pc (Classen et al. 2009). Thus, all of the factors that significantly 
bind to Pc-targeted DNA regions are in turn regulated by Pc. This 
strategy, in which both TRFs and their targets are subject to heri- 
table epigenetic control, highlights a strict multi-tiered mecha- 
nism that can be used to ensure precise and reproducible de- 
velopment of a multicellular organism. The work presented here, 
although it does not exhaust the repertoire of Pc-regulated genes, 
further generalizes this view and underscores the importance of 
PcG regulation for ensuring the somatically heritable, high-fidelity 
maintenance of the spatially restricted patterns of expression of 
such developmentally important transcriptional regulators. 

Exploration of the patterns of TRF occupancy across the ge- 
nome revealed thousands of regions bound by >14 TRFs. It has 
previously been shown that the D. melanogaster, Caenorhabditis 
elegans, and human genomes all contain regions of DNA that are 
bound by numerous, often unrelated, transcriptional regulators 
(Moorman et al. 2006; Negre et al. 2011; Niu et al. 2011; Yip et al. 
2012). These "HOT" regions often do not contain the expected 
DNA motifs for the bound TRFs and binding may be mediated in 
part via protein-protein interactions (Moorman et al. 2006; Kvon 
et al. 2012). Although this widespread, possibly indirect, binding 
has led to the suggestion that HOT regions might drive ubiquitous 
expression, in Drosophila HOT regions can drive patterned gene 
expression (Kvon et al. 2012). Thus, in terms of ris-regulatory ac- 
tivity, HOT regions are sometimes similar to traditional enhancers. 
Nevertheless, we have identified multiple properties of HOT re- 
gions that distinguish them from lower-occupancy TRF-binding 
regions. Relative to lower-occupancy target regions, HOT regions 
are more likely to occur in proximal promoter regions, and more 
likely to fall in YELLOW and RED chromatin regions, and more 
likely to be associated with highly expressed housekeeping genes. 
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Overall, this indicates HOT regions are generally associated with 
the highly accessible DNA found at the promoters of housekeeping 
genes. However, a subset of HOT regions is associated with devel- 
opmental genes, and these genes often have additional non-HOT 
regulatory inputs. 

Consistent with the overrepresentation of GAGA motifs in 
HOT regions (Fig. 3; Kvon et al. 2012), we find binding of Trl to be 
highly enriched in HOT regions across multiple developmental 
contexts, suggesting that Trl may play an important role in 
maintaining HOT regions or influencing the regulatory output of 
HOT regions. Trl plays a role in directing nucleosome turnover and 
is associated with regions of low-nucleosome occupancy (Petesch 
and Lis 2008; Deal et al. 2010; Li and Gilmour 2013) and interacts 
with the FACT and NURF chromatin remodeling complexes (Xiao 
et al. 2001; Shimojima et al. 2003), putting it in a position to 
maintain the accessibility of HOT regions. Additionally, the GAGA 
motif has been associated with paused RNA polymerase II (Pol II), 
and Trl has recently been shown to recruit NELF (negative elon- 
gation factor) to promoters, putting it in a position to modulate 
the release of paused Pol II (Hendrix et al. 2008; Lee et al. 2008; 
Gilchrist et al. 2010; Fay et al. 2011; Li et al. 2013). Thus, through 
interaction with NELF, Trl is also in a position to directly regulate 
gene expression at promoter-proximal HOT regions. 

Surprisingly, HOT regions are generally less evolutionarily 
constrained than lower-occupancy TRF-binding regions. Across 
their central 100 bp, COLD and WARM regions are significantly 
more conserved than HOT regions. There are at least two possi- 
bilities that could explain this finding. One possibility is that HOT 
regions have a locally elevated mutation rate. Indeed, we see 
a pattern similar to the phastCons pattern when looking at SNP 
density across D. melanogaster populations: SNP density in HOT 
regions is significantly higher than in lower-occupancy regions 
(Supplemental Fig. S6). Consistent with the promoter bias of HOT 
regions, some evidence suggests that SNP density increases in the 
proximal promoter in D. melanogaster and humans and, addi- 
tionally, SNPs in the proximal promoter show a biased signature of 
transversions in humans (Guo and Jamison 2005; Main et al. 
2013). Perhaps the accessibility of DNA in promoter-proximal 
HOT regions leads to increased exposure to insults that cause 
mutation. Alternatively, much of the binding at HOT regions ap- 
pears to be functionally neutral and possibly indirect (Moorman 
et al. 2006; Kvon et al. 2012); thus, the mode of binding at HOT 
regions may allow for rapid DNA sequence turnover or insertions/ 
deletions as long as accessibility is maintained. Conversely, DNA 
motif sequence and the spacing between motifs are functionally 
constrained at many, but not all, ds-regulatory modules, and this 
too is the case with lower-occupancy TRF-binding regions (Erives 
and Levine 2004; Arnosti and Kulkarni 2005; Borok et al. 2010; 
Swanson et al. 2010; Evans et al. 2012). This, combined with the 
fact that WARM and COLD regions are more likely to fall distal to 
the promoter, suggests that lower-occupancy TRF regions often 
represent traditional enhancers. 

Despite their differences, however, HOT, WARM, and COLD 
regions are all significantly enriched for binding DNA with en- 
hancer activity. But once again the evidence suggests that the type 
of enhancers targeted differ across the occupancy groups. HOT 
regions are more likely to occur in highly accessible 'open' en- 
hancers that direct gene expression in a context-independent 
fashion (at least across S2 and OSC cell lines), whereas lower- 
occupancy regions are more likely to target less accessible en- 
hancers that tend to be further regulated at the epigenetic level. In 
combination with the results described above, this provides an- 



other piece of evidence that lower-occupancy regions represent 
traditional enhancers, which tend to be subject to more complex 
spatial and developmental regulation, while HOT regions repre- 
sent DNA regions with context-independent, and possibly less 
complex, regulatory functions. 

Methods 

Chromatin immunoprecipitation 

Chromatin collection and chromatin immunoprecipitation were 
performed as described previously (Negre et al. 2011). Transgenic 
lines containing GFP-tagged transcription factors within their 
endogenous genomic contexts were produced using the P[acman] 
bacterial artificial chromosome (BAC) system as previously de- 
scribed (Venken et al. 2009; Roy et al. 2010). Antibody details are 
available at modMine (http://intermine.modencode.org). A num- 
ber of antibodies were generous contributions from members of 
the Drosophila research community: Ken Cadigan (Pan/TCF), Andy 
Dingwall (Cmi/Lpt), Eric Lai (Insv), Erika Bach (Stat92E), Jim 
Kadonaga (TBP), Ken Irvine (Yki), Claude Desplan (Prd), Stephen 
Crews (Sc), Sean Carroll (Dll), Richard Mann (Hth), and Scott 
Hawley (Trem). Immunoprecipitated DNA was prepared for Illu- 
mina sequencing either as described in Negre et al. (2011) or using 
the Epicentre Nextera DNA Sample Preparation Kit. Briefly, Nextera 
library preparations were performed using the High Molecular 
Weight tagmentation buffer, and tagmented DNA was amplified 
using 12 cycles of PCR. DNA was then sequenced on an Illumina 
HiSeq 2000 according to the manufacturer's standard protocols. 

Data processing 

ChlP-chip peak calls are as previously described (Li et al. 2008; 
MacArthur et al. 2009; Zinzen et al. 2009; Negre et al. 2011). For 
ChlP-seq experiments, biological replicates were scored against an 
appropriate input DNA control (from non-immunoprecipitated 
chromatin). The MACS (v2) peak caller was used to identify and 
score (rank) potential binding sites/peaks (Zhang et al. 2008). For 
obtaining optimal thresholds, we used the irreproducible discov- 
ery rate (IDR) framework to determine high-confidence binding 
events by leveraging the reproducibility and rank consistency of 
peak identifications across replicate experiments of a data set (Li 
et al. 2011a). Briefly, for individual replicates, peaks were called 
using MACS2 with a P-value threshold of 1 X 10" 3 to obtain 
a maximum of 30-k peaks, and peaks were ranked according to 
their P-value scores. Replicates were then pooled and MACS2 was 
again used to call peaks at a P-value threshold of 1 x 10~ 3 . Peaks 
from the pooled set that overlapped at least one peak in both in- 
dividual replicate sets were retained. From this set of replicate-re- 
producible peaks, we obtained two independent rankings based on 
the P- values from each replicate; this pair of ranked lists was used as 
input for the IDR framework. Cross-replicate and pseudoreplicate 
rank thresholds at an IDR of 5% were generated, and the better of 
the two was used to generate the final set of rank consistent and 
reproducible peaks. Details of the IDR framework are available at 
https://sites.google.com/site/anshulkundaje/projects/idr. All data 
sets are available through the modENCODE Data Coordinating 
Center (DCC; see Supplemental Table SI for DCC submission IDs). 

Peak annotation and TRF-binding overlap 

ChIP peaks were annotated as overlapping genomic features 
according to the FlyBase r5.34 gene structure annotation and the 
following categories: promoter (transcription start site; within 1-kb 
upstream of a transcription start site or overlapping 5' UTR), cod- 
ing (coding sequence), downstream (transcription stop site; 3' UTR 
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or 200-bp downstream from gene), intron, intergenic regions. 
Peaks were assigned to target genes based on the nearest tran- 
scription start site (<10,000 bp). Peaks were assigned to chro- 
matin states based on their overlap with the five states defined 
in Filion et al. (2010), using the priority order of BLUE- 
>RED>YELLOW>GREEN>BLACK in cases where one peak over- 
lapped multiple chromatin types. 

To calculate the significance of cobinding for two TRFs, the 
occurrence of colocalization of each pair of ChIP peak sets was 
compared with a permutated background performed 10,000 times, 
and a Z-score was assigned to each pair to indicate whether the co- 
occurrence was significantly higher or lower than expectation (see 
http://www.encodestatistics.org/). Regions of significant cobinding — 
HOT regions — were defined following the algorithm described 
in Negre et al. (2011). Briefly, to establish HOT regions based on 
the colocalization of all 171 regulators on the D. melanogaster 
genome, we took centers of all peaks of all regulators to represent 
their genomic coordinates and calculated the density genome 
widely using 300-bp bandwidth Kernel Density Estimation. We 
then scanned the density scores peak wide and denoted each peak 
a HOT region candidate. The complexity (occupancy) of each 
HOT region candidate was calculated by summing the Gaussian 
kernalized distance from the peak to peaks of each other regulator 
that contributed at least 0.1 to this strength. Finally, we named 
these candidate regions as HOT if the complexity was >15, as 
COLD if the complexity was <3, and as WARM for all the rest. 
HOT, WARM, and COLD regions were assigned to genomic fea- 
tures, target genes, and chromatin states as described above for 
individual TRFs. Only level 5 Gene Ontology (GO) categories in 
which the enrichment P- value (Bonferroni corrected) was <1 X 
10~ 5 for at least one of the three categories (COLD only; HOT + 
COLD; HOT only) were used for GO category clustering (Fig. 5). 
The top motifs enriched in HOT regions (Fig. 5) were identified 
using FIMO (Grant et al. 2011). 

Data access 

Data from this study have been submitted to the NCBI Gene 
Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/), 
modMine (http://intermine.modencode.org), or both under GEO 
accession nos. GSE49768-GSE49780, GSE49899, or the modMINE 
DCC IDs 2627, 2629, 2630, 2633, 2634, 2636, 3234, 3240, 3390- 
3396, 3399-3401, 3403. See Supplemental Table SI for all data set 
submission information. 
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